Study of integration of statistical model-based voice activity detection and noise suppression
Authors
Abstract
This paper addresses robust front-end processing for automatic speech recognition (ASR) in noisy environments. To recognize corrupted speech accurately, methods that are robust against various types of interference are needed. Noise suppression (NS) is usually used as front-end processing for ASR in noise. Voice activity detection (VAD) is also used in the front end to remove redundant non-speech periods. VAD and NS are typically combined in series. However, VAD and NS should not be treated as separate techniques, because the output of each can benefit the other. We therefore investigate an integrated front end in which VAD and NS use each other's input and output information. The evaluation uses a concatenated speech corpus, CENSREC-1-C. In the evaluation, the proposed method improves ASR accuracy compared with the conventional series combination.
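The difference between a series combination and a front end in which VAD and NS share information can be illustrated with a toy sketch. This is not the statistical model-based method evaluated in the paper. It assumes a simple energy-based VAD and power subtraction on per-frame powers, and every function name, threshold, and smoothing factor below is hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def frame_powers(signal: List[float], frame_len: int) -> List[float]:
    """Mean power of each non-overlapping frame (trailing samples are dropped)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def series_vad_then_ns(
    powers: List[float], threshold: float
) -> Tuple[List[bool], List[float]]:
    """Series combination: a fixed-threshold VAD runs first, then NS uses its labels."""
    is_speech = [p > threshold for p in powers]
    # NS estimates the noise power once, from the frames the VAD marked as non-speech.
    noise_frames = [p for p, s in zip(powers, is_speech) if not s]
    noise_power = sum(noise_frames) / len(noise_frames) if noise_frames else 0.0
    cleaned = [max(p - noise_power, 0.0) for p in powers]
    return is_speech, cleaned


def mutual_vad_ns(
    powers: List[float],
    snr_threshold: float = 3.0,  # hypothetical decision threshold
    smoothing: float = 0.9,      # hypothetical noise-tracking factor
    init_frames: int = 2,        # assumes the first frames contain noise only
) -> Tuple[List[bool], List[float]]:
    """Feedback variant: the VAD thresholds an SNR computed from the NS noise
    estimate, and the NS noise estimate is updated only on non-speech frames."""
    noise_power = sum(powers[:init_frames]) / init_frames
    is_speech: List[bool] = []
    cleaned: List[float] = []
    for p in powers:
        snr = p / noise_power if noise_power > 0 else float("inf")
        speech = snr > snr_threshold      # VAD consumes the NS noise estimate
        if not speech:                    # NS consumes the VAD decision
            noise_power = smoothing * noise_power + (1 - smoothing) * p
        is_speech.append(speech)
        cleaned.append(max(p - noise_power, 0.0))
    return is_speech, cleaned
```

In the series version the VAD threshold is fixed and independent of the noise level, whereas in the feedback version the speech decision adapts as the noise estimate changes. The paper's integrated method is built on statistical models rather than on this simple energy rule.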
Similar resources
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
A study of mutual front-end processing method based on statistical model for noise robust speech recognition
This paper addresses robust front-end processing for automatic speech recognition (ASR) in noise. Accurate recognition of corrupted speech requires noise robust front-end processing, e.g., voice activity detection (VAD) and noise suppression (NS). Typically, VAD and NS are combined as one-way processing, and are developed independently. However, VAD and NS should not be assumed to be independen...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
A statistical model-based voice activity detection using multiple DNNs and noise awareness
In this paper, we propose an ensemble of deep neural networks (DNNs) that uses acoustic environment classification for statistical model-based voice activity detection (VAD). Since conventional decision functions for statistical model-based VAD are based on a shallow model and cannot take advantage of the diversity of the space distribution of features, we propose using multiple DNNs s...
A priori SNR estimation and noise estimation for speech enhancement
A priori signal-to-noise ratio (SNR) estimation and noise estimation are important for speech enhancement. In this paper, a novel modified decision-directed (DD) a priori SNR estimation approach based on single-frequency entropy, named DDBSE, is proposed. DDBSE replaces the fixed weighting factor in the DD approach with an adaptive one calculated according to the change in single-frequency entropy....